Searching RDF Graphs with SPARQL and Keywords

Authors

  • Shady Elbassuoni
  • Maya Ramanath
  • Ralf Schenkel
  • Gerhard Weikum
Abstract

The proliferation of knowledge-sharing communities like Wikipedia and the advances in automated information extraction from Web pages enable the construction of large knowledge bases with facts about entities and their relationships. The facts can be represented in the RDF data model, as so-called subject-property-object triples, and can thus be queried by structured query languages like SPARQL. In principle, this allows precise querying in the database spirit. However, RDF data may be highly diverse and queries may return far too many results, so that ranking by informativeness measures is crucial to avoid overwhelming users. Moreover, as facts are extracted from textual contexts or have community-provided annotations, it can also be beneficial to consider keywords when formulating search requests. This paper gives an overview of recent and ongoing work on ranked retrieval of RDF data with keyword-augmented structured queries. The ranking method is based on statistical language models, the state-of-the-art paradigm in information retrieval. The paper develops a novel form of language models for the structured, but schema-less setting of RDF triples and extended SPARQL queries.

1 Motivation and Background

Entity-Relationship graphs are receiving great attention for information management outside of mainstream database engines. In particular, the Semantic-Web data model RDF (Resource Description Framework) is gaining popularity for applications on scientific data such as biological networks [14], social Web 2.0 applications [4], large-scale knowledge bases such as DBpedia [2] or YAGO [13], and more generally, as a lightweight representation for the “Web of data” [5]. An RDF data collection consists of a set of subject-property-object triples, SPO triples for short. In ER terminology, an SPO triple corresponds to a pair of entities connected by a named relationship or to an entity connected to the value of a named attribute.
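To make the triple model concrete, here is a minimal Python sketch of SPO triples and a keyword-augmented triple pattern. It is an illustration under stated assumptions, not the method of the paper: the entity and property names (`Movie_A`, `hasDirector`, and so on), the `text` field attached to a triple, and the `match` helper are all hypothetical, and the keyword condition is a plain Boolean filter rather than the language-model ranking the paper describes.

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence


class Triple(NamedTuple):
    """One subject-property-object (SPO) triple.

    `text` is a hypothetical field standing in for the textual context
    or annotation associated with a fact.
    """
    subject: str
    prop: str
    obj: str
    text: str = ""


# Invented sample data for illustration only (not the paper's Table 1).
TRIPLES = [
    Triple("Movie_A", "hasDirector", "Person_X",
           "award-winning thriller about a bank heist"),
    Triple("Movie_A", "hasReleaseYear", "1999"),  # entity -> attribute value
    Triple("Movie_B", "hasDirector", "Person_X",
           "romantic comedy set in Paris"),
    Triple("Person_X", "bornIn", "City_Y"),       # an object reused as a subject
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None,
          keywords: Sequence[str] = ()) -> List[Triple]:
    """Return the triples that match a single triple pattern.

    A position set to None acts like a query variable (wildcard).
    Every keyword must occur as a word in the triple's text
    (case-insensitive). No ranking is performed.
    """
    hits = []
    for t in triples:
        if s is not None and t.subject != s:
            continue
        if p is not None and t.prop != p:
            continue
        if o is not None and t.obj != o:
            continue
        words = set(t.text.lower().split())
        if all(k.lower() in words for k in keywords):
            hits.append(t)
    return hits


if __name__ == "__main__":
    # Pattern "?m hasDirector Person_X" restricted by the keyword "thriller".
    for t in match(TRIPLES, p="hasDirector", o="Person_X",
                   keywords=["thriller"]):
        print(t.subject, t.prop, t.obj)
```

In this sketch, `Person_X` appears as the object of two triples and as the subject of a third, which is what allows the same data to be read as a graph.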
As the object of a triple can in turn be the subject of other triples, we can also view the RDF data as a graph of typed nodes and typed edges where nodes correspond to entities and edges to relationships (viewing attributes as relations as well). Some of the existing RDF collections contain more than a billion triples.

As a simple example that we will use throughout the paper, consider a Web portal on movies. Table 1 shows a few sample triples. The example illustrates a number of specific requirements that RDF data poses for querying:

Copyright 2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering


Similar resources

Searching and Generating Authoring Information: A Hybrid Approach

In this paper, the authors propose a novel approach to search and retrieve authoring information from online authoring databases. The proposed approach combines keywords and semantic-based methods. In this approach, the user can retrieve such information considering some specified keywords and ignore how the internal semantic search is being processed. The keywords entered by the user are inter...


Extending SPARQL with regular expression patterns (for querying RDF)

RDF is a knowledge representation language dedicated to the annotation of resources within the framework of the semantic web. Among the query languages for RDF, SPARQL allows querying RDF through graph patterns, i.e., RDF graphs involving variables. Other languages, inspired by the work in databases, use regular expressions for searching paths in RDF graphs. Each approach can expres...


Keyword Search on RDF Graphs: It Is More Than Just Searching for Keywords

In this paper, we propose a model for enabling users to search RDF data via keywords, thus, allowing them to discover relevant information without using complicated queries or knowing the underlying ontology or vocabulary. We aim at exploiting the characteristics of the RDF data to increase the quality of the ranked query results. We consider different dimensions for evaluating the value of res...


Using Patterns for Keyword Search in RDF Graphs

An increasing number of RDF datasets are available on the Web. Querying RDF data requires the knowledge of a query language such as SPARQL; it also requires some information describing the content of these datasets. The goal of our work is to facilitate the querying of RDF datasets, and we present an approach for enabling users to search in RDF data using keywords. We introduce the notion of pa...


k-nearest keyword search in RDF graphs

Resource Description Framework (RDF) has been widely used as a W3C standard to describe the resource information in the Semantic Web. A standard SPARQL query over RDF data requires query issuers to fully understand the domain knowledge of the data. Because of this fact, SPARQL queries over RDF data are not flexible and it is difficult for non-experts to create queries without knowing the underl...


A Tool for Efficiently Processing SPARQL Queries on RDF Quads

We present a tool called RIQ (RDF Indexing on Quads) for efficiently processing SPARQL queries on large RDF datasets containing quads. RIQ’s novel design includes: (a) a vector representation of RDF graphs for efficient indexing, (b) a filtering index for efficiently organizing similar RDF graphs, and (c) a decrease-and-conquer strategy for efficient query processing using the filtering index t...



Journal title:
  • IEEE Data Eng. Bull.

Volume 33  Issue 

Pages  -

Publication date 2010